An Extensible Framework for Data Cleaning

Authors

  • Helena Galhardas
  • Daniela Florescu
  • Dennis Shasha
  • Eric Simon
Abstract

We propose an extensible data cleaning tool, named AJAX, that supports the specification and efficient execution of complex data cleaning programs.

Similar resources

Research and Realization of the Extensible Data Cleaning Framework

This paper proposes an extensible data cleaning framework built on the key technologies of data cleaning; the framework includes an open rules library and an algorithms library. The paper describes the framework's model principle and working process, and verifies its validity by experiment. When the dat...

Declarative Support for Sensor Data Cleaning

Pervasive applications rely on data captured from the physical world through sensor devices. Data provided by these devices, however, tend to be unreliable. The data must, therefore, be cleaned before an application can make use of them, leading to additional complexity for application development and deployment. Here we present Extensible Sensor stream Processing (ESP), a framework for buildin...

XML based Framework for ETL Processes For Relational Databases

In data warehousing, Extraction-Transformation-Loading (ETL) comprises the key tasks responsible for extracting data from several sources, then cleansing, customizing, and inserting it into the data warehouse [10]. More specifically, ETL tools are a category of specialized tools that deal with data warehouse cleaning and loading problems. These tasks are very critical in every d...

TAILOR: A Record Linkage Tool Box

Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, ...

DeepDetect: An Extensible System for Detecting Attribute Outliers & Duplicates in XML

XML, the eXtensible Markup Language, is fast evolving into the new standard for data representation and exchange on the WWW. This has resulted in a growing number of data cleaning techniques to locate “dirty” data (artifacts). In this paper, we present DeepDetect – an extensible system that detects attribute outliers and duplicates in XML documents. Attribute outlier detection finds objects tha...

Publication date: 2000